Enhance percentage display
This fixes issue #598, where DIA-NN data processing would fail with certain datasets such as PXD063291.

Changes:
1. Fixed sdrf_pipelines import paths for compatibility with newer versions (0.1.0+) while maintaining backward compatibility with older versions. The module structure changed from sdrf_pipelines.openms to sdrf_pipelines.converters.openms.
2. Added robust column existence checks in dia_utils.py:
   - _prepare_quant_table_data: now checks for required columns and falls back to Precursor.Quantity if Precursor.Normalised is missing
   - create_peptides_table: handles a missing Q.Value column gracefully
   - create_protein_table: uses the correct intensity column, selected dynamically
   - draw_diann_quant_table: skips tables gracefully when data is missing
   - _merge_condition_data: uses the correct intensity column for merging
3. Improved error handling to log warnings instead of raising exceptions when optional data is missing, allowing reports to complete with the available data.

https://claude.ai/code/session_01VrgNLeqdhEHweocVpUmcRT
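The intensity-column fallback and the optional Q.Value handling from item 2 can be sketched roughly as follows. This is a minimal illustration assuming the DIA-NN report is loaded as a pandas DataFrame; `select_intensity_column`, `average_peptide_intensity`, and the `q_cutoff=0.01` default are hypothetical names and values chosen for the example, not pmultiqc's actual code.

```python
import logging

import pandas as pd

log = logging.getLogger(__name__)

# Preferred column first; Precursor.Quantity is the fallback named in the PR.
INTENSITY_CANDIDATES = ("Precursor.Normalised", "Precursor.Quantity")


def select_intensity_column(report: pd.DataFrame):
    """Return the first available intensity column, or None if neither exists."""
    for col in INTENSITY_CANDIDATES:
        if col in report.columns:
            return col
    log.warning("No intensity column found; expected one of %s", INTENSITY_CANDIDATES)
    return None


def average_peptide_intensity(report: pd.DataFrame, q_cutoff: float = 0.01):
    """Hypothetical helper: mean intensity per Modified.Sequence.

    The Q.Value filter is applied only when that column exists, so a report
    without Q.Value still produces a table instead of raising KeyError.
    """
    intensity_col = select_intensity_column(report)
    if intensity_col is None or "Modified.Sequence" not in report.columns:
        return None
    if "Q.Value" in report.columns:
        report = report[report["Q.Value"] <= q_cutoff]
    else:
        log.warning("Q.Value column missing; skipping FDR filter")
    return report.groupby("Modified.Sequence")[intensity_col].mean()
```

Returning `None` rather than raising keeps the report running, but every caller must then handle `None`; the review further down shows what happens when one does not.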
Keep using the original import paths from sdrf_pipelines (pre-0.1.0 version):
- sdrf_pipelines.openms.openms.UnimodDatabase
- sdrf_pipelines.openms.openms.OpenMS

The column check fixes for DIA-NN processing remain in place.

https://claude.ai/code/session_01VrgNLeqdhEHweocVpUmcRT
Added column existence checks to prevent crashes when DIA-NN reports are missing expected columns:
- _process_diann_statistics: checks for Protein.Group, Modified.Sequence
- _process_peptide_search_scores: checks for Modified.Sequence, Q.Value
- _process_modifications: checks for Modified.Sequence
- _process_run_data: checks for Run, Modified.Sequence, Modifications, Protein.Group
- draw_dia_ids_rt: checks for Run, RT

Functions now return safe default values and log warnings when columns are missing, allowing reports to complete with available data.

https://claude.ai/code/session_01VrgNLeqdhEHweocVpUmcRT
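The check-then-default pattern listed above can be sketched as a small reusable guard. This assumes pandas DataFrames; `missing_columns` and `process_rt_ids` are hypothetical names, and `process_rt_ids` only stands in for a step like draw_dia_ids_rt (its real output is a plot, not a dict).

```python
import logging

import pandas as pd

log = logging.getLogger(__name__)


def missing_columns(df: pd.DataFrame, required):
    """List the required columns that are absent from df, in the given order."""
    return [c for c in required if c not in df.columns]


def process_rt_ids(report: pd.DataFrame):
    """Hypothetical stand-in for a step such as draw_dia_ids_rt.

    Returns per-run identification counts, or an empty dict (the safe default)
    plus a logged warning when Run or RT is missing.
    """
    missing = missing_columns(report, ["Run", "RT"])
    if missing:
        log.warning("Skipping RT/ID summary; missing columns: %s", missing)
        return {}
    return report.groupby("Run")["RT"].count().to_dict()
```

An empty container is a safer default than `None` when callers iterate over the result, because it needs no extra check at the call site.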
Add robust error handling and fallback logic to DIA-NN module
📝 Walkthrough

This PR adds input validation and defensive returns to DIA-NN report processing, introduces intensity column handling with fallback logic across data structures, and updates CPSwitch configuration flags in plotting functions to improve data integrity and control flow.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Report as DIA Report
    participant Parser as parse_diann_report
    participant ModProc as _process_modifications
    participant RunProc as _process_run_data
    participant QuantTable as _prepare_quant_table_data
    participant Condition as _merge_condition_data
    participant PlotFunc as create_peptides_table/protein_table

    Report->>Parser: Input report data
    Parser->>ModProc: Process modifications
    alt Modifications present
        ModProc-->>Parser: Success (sets Modifications)
    else Modifications missing
        ModProc-->>Parser: Returns False (warning logged)
    end
    alt Modifications succeeded
        Parser->>RunProc: Process run data
        RunProc-->>Parser: cal_num_table_data (with validation)
    else Modifications failed
        Parser->>Parser: Use placeholder cal_num_table_data
    end
    Parser->>QuantTable: Prepare quantification table
    QuantTable->>QuantTable: Check Precursor.Normalised or fallback to Precursor.Quantity
    QuantTable->>QuantTable: Store intensity_col in attrs
    QuantTable-->>Parser: report_data (with intensity_col in attrs)
    Parser->>Condition: Merge condition data
    Condition->>Condition: Read intensity_col from attrs (with fallback)
    Condition->>Condition: Store intensity_col in cond_report_data.attrs
    Condition-->>Parser: cond_report_data
    Parser->>PlotFunc: Create quantification plots
    PlotFunc->>PlotFunc: Retrieve intensity_col from attrs
    PlotFunc->>PlotFunc: Calculate Average Intensity using dynamic column
    PlotFunc-->>PlotFunc: Generate plots or skip with warning if data insufficient
```
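The intensity_col hand-off in the diagram can be sketched with pandas' `DataFrame.attrs` dictionary. pandas documents `attrs` as experimental, and propagation through operations such as `merge` is not something to rely on, which is presumably why the diagram re-stores intensity_col on cond_report_data. The function names, the `Condition` column, and the `Run` join key below are assumptions for illustration.

```python
import pandas as pd

DEFAULT_INTENSITY = "Precursor.Quantity"  # assumed fallback when attrs is empty


def prepare_quant_table(report: pd.DataFrame) -> pd.DataFrame:
    """Pick the intensity column and record the choice in attrs."""
    if "Precursor.Normalised" in report.columns:
        col = "Precursor.Normalised"
    else:
        col = DEFAULT_INTENSITY
    report = report.copy()
    report.attrs["intensity_col"] = col
    return report


def merge_condition_data(report: pd.DataFrame, design: pd.DataFrame) -> pd.DataFrame:
    """Join sample conditions, then re-attach intensity_col explicitly."""
    intensity_col = report.attrs.get("intensity_col", DEFAULT_INTENSITY)
    merged = report.merge(design, on="Run", how="left")
    # Do not assume attrs survived the merge; set it again on the result.
    merged.attrs["intensity_col"] = intensity_col
    return merged


def average_intensity_by_condition(cond_report: pd.DataFrame) -> pd.Series:
    """Read the column choice back from attrs, with the same fallback."""
    intensity_col = cond_report.attrs.get("intensity_col", DEFAULT_INTENSITY)
    return cond_report.groupby("Condition")[intensity_col].mean()
```

Passing the column name as an explicit return value or argument would avoid relying on `attrs` entirely; `attrs` mainly saves changing several function signatures at once.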
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@pmultiqc/modules/common/dia_utils.py`:
- Around lines 160-165: The callers pass pep_plot from _process_diann_statistics
into draw_num_pep_per_protein and will crash if pep_plot is None; add a guard at
the start of draw_num_pep_per_protein to check for pep_plot is None, log or warn
about missing peptide plot data, and return a safe value (e.g., None or empty
figure) instead of accessing pep_plot.dict["data"]. Update
draw_num_pep_per_protein (and any closely related plotting helpers) to
early-return when pep_plot is None so quantms.py and diann.py no longer need to
change their call sites.
```python
import logging

log = logging.getLogger(__name__)  # module-level logger (assumed; defined elsewhere in dia_utils.py)


def _process_diann_statistics(report_data):
    """Process DIA-NN statistics and create peptide plot."""
    required_cols = ["Protein.Group", "Modified.Sequence"]
    if not all(col in report_data.columns for col in required_cols):
        log.warning(f"Missing required columns for statistics: {[c for c in required_cols if c not in report_data.columns]}")
        return 0, 0, None
    # ... rest of the function not shown in the review excerpt
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Search for parse_diann_report usage and context
rg -n --type=py -C5 'parse_diann_report' -g '!*test*'
```
Repository: bigbio/pmultiqc
Length of output: 3955
🏁 Script executed:
```bash
#!/bin/bash
# Get the full parse_diann_report function implementation
sed -n '32,150p' pmultiqc/modules/common/dia_utils.py
```
Repository: bigbio/pmultiqc
Length of output: 4386
🏁 Script executed:
```bash
#!/bin/bash
# Get the _process_diann_statistics function
sed -n '160,200p' pmultiqc/modules/common/dia_utils.py
```
Repository: bigbio/pmultiqc
Length of output: 1759
🏁 Script executed:
```bash
#!/bin/bash
# Check how pep_plot is used in quantms.py after assignment
rg -n --type=py 'self.pep_plot' pmultiqc/modules/quantms/quantms.py -A3 -B1
```
Repository: bigbio/pmultiqc
Length of output: 1837
🏁 Script executed:
```bash
#!/bin/bash
# Check how pep_plot is used in diann.py after assignment
rg -n --type=py 'self.pep_plot' pmultiqc/modules/diann/diann.py -A3 -B1
```
Repository: bigbio/pmultiqc
Length of output: 420
🏁 Script executed:
```bash
#!/bin/bash
# Find the function that receives pep_plot
rg -n --type=py 'def.*draw_num_pep_per_protein' -A5
```
Repository: bigbio/pmultiqc
Length of output: 476
🏁 Script executed:
```bash
#!/bin/bash
# Check if there's any None-checking for pep_plot before calling draw_num_pep_per_protein
rg -n --type=py 'draw_num_pep_per_protein' -B5 -A1 pmultiqc/modules/diann/diann.py
```
Repository: bigbio/pmultiqc
Length of output: 623
🏁 Script executed:
```bash
#!/bin/bash
# Check the full draw_num_pep_per_protein function to see how pep_plot is used
sed -n '1001,1030p' pmultiqc/modules/common/plots/id.py
```
Repository: bigbio/pmultiqc
Length of output: 901
🏁 Script executed:
```bash
#!/bin/bash
# Check the same in quantms.py for draw_num_pep_per_protein usage
rg -n --type=py 'draw_num_pep_per_protein' -B10 -A5 pmultiqc/modules/quantms/quantms.py
```
Repository: bigbio/pmultiqc
Length of output: 1802
🏁 Script executed:
```bash
#!/bin/bash
# Check if there's any None-check around the draw_num_pep_per_protein calls in quantms.py
sed -n '350,365p' pmultiqc/modules/quantms/quantms.py
```
Repository: bigbio/pmultiqc
Length of output: 648
🏁 Script executed:
```bash
#!/bin/bash
# Check the broader context around draw_num_pep_per_protein in quantms.py
sed -n '340,365p' pmultiqc/modules/quantms/quantms.py
```
Repository: bigbio/pmultiqc
Length of output: 1039
🏁 Script executed:
```bash
#!/bin/bash
# Check broader context in quantms.py around draw_num_pep_per_protein calls
sed -n '320,365p' pmultiqc/modules/quantms/quantms.py
```
Repository: bigbio/pmultiqc
Length of output: 1888
🏁 Script executed:
```bash
#!/bin/bash
# Double-check if there's any conditional logic around draw_num_pep_per_protein calls
sed -n '355,365p' pmultiqc/modules/quantms/quantms.py && echo "---" && sed -n '185,195p' pmultiqc/modules/diann/diann.py
```
Repository: bigbio/pmultiqc
Length of output: 688
Callers do not handle pep_plot=None gracefully; this will crash at runtime.
When required columns are missing, _process_diann_statistics returns None for pep_plot. Both quantms.py (line 358) and diann.py (line 187) pass this directly to draw_num_pep_per_protein, which immediately accesses pep_plot.dict["data"] (id.py:1006) without a None check. This causes an AttributeError if column validation fails.
Add a guard in draw_num_pep_per_protein or skip the call when pep_plot is None.
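A rough sketch of the callee-side guard the review proposes is below. The real signature and plotting body of draw_num_pep_per_protein are not shown in this PR, so the single `pep_plot` parameter and the return values are assumptions; only the `pep_plot.dict["data"]` access (id.py:1006) comes from the review.

```python
import logging

log = logging.getLogger(__name__)


def draw_num_pep_per_protein(pep_plot):
    """Hypothetical sketch: bail out early when no peptide plot was built.

    _process_diann_statistics returns None for pep_plot when required
    columns are missing, so check before touching pep_plot.dict.
    """
    if pep_plot is None:
        log.warning("No peptide-per-protein data available; skipping plot.")
        return None
    data = pep_plot.dict["data"]
    # ... build and register the plot from `data` here (omitted) ...
    return data
```

Putting the check inside the callee, as the review suggests, covers both call sites (quantms.py and diann.py) without changing either of them.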